Divergence on the mitochondria

As we showed in the other file, we can do away with nu-mts and the like by taking the majority allele at each site.

If we genotype the individuals this way, we get the following matrix of divergences, ordered by PC1:

dmat <- matrix(NA,nrow=272,ncol=272)
for (i in 1:272) { for (j in 1:272) { dmat[i,j] <- dmat[j,i] <- mean(major.allele[,i]!=major.allele[,j],na.rm=TRUE) } }
# dmat[order(pcs$PC1),order(pcs$PC1)]
rownames(dmat) <- colnames(dmat) <- colnames(major.allele)
hist(dmat[upper.tri(dmat,diag=FALSE)],breaks=100,xlab="divergence")

Excluding etort-50, which has mean divergence 0.0116628, the mean divergence is 0.0020987. The mean divergence between haplotypes is 0.0037559, which is comparable to the nuclear divergence of around 0.0047608.

Here’s a visual representation of this:

etid <- gsub(" (sheared2)","",gsub("etort-","",pcs$etort[order(pcs$PC1)]),fixed=TRUE)
# pdf(file="mt-divergence.pdf",width=6,height=18,pointsize=8)
plot(0,0,xlim=range(dmat),ylim=c(1,272),type='n',yaxt='n',xlab='mitochondrial divergence',ylab='')
for (k in unique(pc.plus.cols)) {
    axis(2,at=(1:272)[pc.plus.cols[order(pcs$PC1)]==k],labels=etid[pc.plus.cols[order(pcs$PC1)]==k],las=2,
         col=k, col.axis=k, cex.axis=0.75)
}
text(dmat[order(pcs$PC1),order(pcs$PC1)],row(dmat),labels=rep(etid,each=length(etid)),
     col=adjustcolor(pc.plus.cols[order(pcs$PC1)][col(dmat)],0.75), cex=0.5 )

# dev.off()

Geographic distribution of segregating sites

Let’s see what the geographic distribution of the SNPs on the mitochondria look like. First, we need to find them. Here are the allele frequencies and locations on the genome of the segregating sites, i.e. the ones where some tortoise’s major allele differs from the total major allele:

# for each tortoise, does their major allele match the overall major allele?
ismajor <- ( major.allele == total.major.allele )
segregating <- which( apply(!ismajor,1,any,na.rm=TRUE) )
segfreqs <- 1-rowMeans(ismajor[segregating,],na.rm=TRUE)
plot( segregating, segfreqs, xlab="position", ylab="minor allele frequency" )

There are 477 such sites, of which 58 are at minor allele frequency above 20%.

Here are maps of those 58 sites; alleles more common in northern tortoises than in southern ones are colored black.

northern.torts <- (pcs$PC1>0)
northern.freqs <- rowMeans( ismajor[segregating,northern.torts], na.rm=TRUE )
southern.freqs <- rowMeans( ismajor[segregating,!northern.torts], na.rm=TRUE )
northern.alleles <- ismajor[segregating,]
northern.alleles[(northern.freqs>southern.freqs),] <- !northern.alleles[(northern.freqs>southern.freqs),]
layout(t(1:3))
for (k in segregating[segfreqs>0.2]) {
    kk <- match(k,segregating)
    player(paste("site",k))
    points(coords, pch=19,
            col=adjustcolor(ifelse(northern.alleles[kk,],"blue","purple"),0.5), cex=1)
    # if (interactive() && is.null(locator(1))) break
}

Haplotypes

hap.scores <- colMeans(northern.alleles[segfreqs>0.2,])
north.hap <- (hap.scores>0.5)
# pdf(file="mt-hap-map.pdf",width=3,height=3,pointsize=10)
# par(mar=c(0,0,3,0)+.1)
player("mitochondrial haplotypes")
points(coords, pch=19,
        col=adjustcolor(ifelse(rownames(coords@coords)=="etort-50",NA,
                     ifelse(north.hap,"blue","purple")),0.5), cex=1)
points(coords['etort-50'],pch="*")

# dev.off()

The SNPs themselves

Here is an image of all alleles seen in at least three tortoises, with individuals ordered by their projection on PC1, and SNPs in order along the chromosome; blue is alleles common in the north but not the south; purple is an allele common in the south but not the north; and red is an allele common nowhere.

nall <- (northern.alleles[rowSums(northern.alleles,na.rm=TRUE)>2,])[,order(pcs$PC1)]
colmat <- matrix(ifelse(nall,"blue","purple"),nrow=nrow(nall),ncol=ncol(nall))
rare.alleles.1 <- (rowMeans(nall,na.rm=TRUE)<0.25)
rare.alleles.2 <- (rowMeans(nall,na.rm=TRUE)>0.75)
colmat[rare.alleles.1,] <- ifelse(nall[rare.alleles.1,],"red","grey")
colmat[rare.alleles.2,] <- ifelse(nall[rare.alleles.2,],"grey","red")
png(file="mt-haplotypes.png",width=5*288,height=2.5*288,res=288,pointsize=8)
par(mar=c(3,0,2,0)+.1)
plot( col(nall), row(nall), 
     main="mitochondrial haplotypes",
        col=adjustcolor(colmat,0.75),
        pch=15, xlab='', xaxt='n', yaxt='n', ylab='' )
axis(1, at=seq_along(tort.ids), labels=gsub(" (sheared2)","",tort.ids[order(pcs$PC1)],fixed=TRUE), cex=0.25, las=3, cex.axis=0.25 )
dev.off()
## png 
##   2

Trees

A fasta file of the sequences at the segregating sites has been written to mitochondria-segsites.fasta.

seqs <- read.phyDat("mitochondria-segsites.fasta",format="fasta",type="DNA")
treeNJ <- NJ(dist.dna(as.DNAbin(seqs)))
plot(treeNJ)

# fit <- pml(treeNJ,data=seqs)
# mltree <- optim.pml(fit,TRUE)